significance test (see Chapter 3 for an introduction to statistical power).
Do you want to estimate the value of the slope (or intercept) to within a certain margin of
error? If so, you want to calculate the sample size required to achieve a certain precision in your
estimate.
Testing the statistical significance of a slope is exactly equivalent to testing the statistical significance
of a correlation coefficient, so the sample-size calculations are also the same for the two types of tests.
If you haven’t already, check out Chapter 15, which contains guidance and formulas to estimate how
many participants you need to test for any specified degree of correlation.
If you’re using regression to estimate the value of a regression coefficient — for example, the slope of
the straight line — then the sample-size calculations become more complicated. The precision of the
slope depends on several factors:
The number of data points: More data points give you greater precision. SEs vary inversely with
the square root of the sample size. Equivalently, the required sample size varies inversely with the
square of the desired SE. So, if you quadruple the sample size, you cut the SE in half. This is a
very important and generally applicable principle.
Tightness of the fit of the observed points to the line: The closer the data points hug the line, the
more precisely you can estimate the regression coefficients. The effect is directly proportional, in
that twice as much Y-scatter of the points produces twice as large a SE in the coefficients.
How the data points are distributed across the range of the X variable: This effect is hard to
quantify, but in general, having the data points spread out evenly over the entire range of X
produces more precision than having most of them clustered near the middle of the range.
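The three factors above can be seen together in the textbook formula for the SE of the slope in a straight-line regression: the residual standard deviation divided by the square root of the sum of squared deviations of X from its mean. The following is a minimal Python sketch of that formula, not a full regression routine. The function name `slope_se`, the X values, and the residual SD of 10 are hypothetical choices made only for illustration, and the residual SD is treated as known here, whereas in a real analysis it is estimated from the data.

```python
import math

def slope_se(x_values, residual_sd):
    """SE of the slope in simple linear regression: residual SD / sqrt(Sxx)."""
    mean_x = sum(x_values) / len(x_values)
    # Sxx: sum of squared deviations of X from its mean
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    return residual_sd / math.sqrt(sxx)

# Hypothetical designs, each with 20 data points
even_x = list(range(1, 21))    # spread evenly over the range 1 to 20
clustered_x = [10, 11] * 10    # bunched near the middle of the range

base = slope_se(even_x, residual_sd=10.0)
quadrupled = slope_se(even_x * 4, residual_sd=10.0)   # same design, 4x the points
noisier = slope_se(even_x, residual_sd=20.0)          # twice the Y-scatter
clustered = slope_se(clustered_x, residual_sd=10.0)   # poor spread in X

print(f"baseline SE:          {base:.3f}")
print(f"4x sample size:       {quadrupled:.3f}")
print(f"2x Y-scatter:         {noisier:.3f}")
print(f"clustered X values:   {clustered:.3f}")
```

In this sketch, quadrupling the number of points halves the SE, doubling the scatter doubles it, and clustering the X values near the middle of the range makes it considerably larger than the evenly spread design.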
Given these factors, how do you strategically design a study and gather data for a linear regression
where you’re mainly interested in estimating a regression coefficient to within a certain precision?
One practical approach is to first conduct a study that is small and underpowered, called a pilot study,
to estimate the SE of the regression coefficient. Imagine you enroll 20 participants and measure them,
then create a regression model. If you’re really lucky, the SE may be as small as you wanted, or even
smaller, so you know that a larger study will have more than enough participants.
But the SE from a pilot study usually isn’t small enough (unless you’re a lot luckier than we’ve
ever been). That’s when you can reach for the square-root law as a remedy! Follow these steps to
calculate the total sample size you need to get the precision you want:
1. Divide the SE that you got from your pilot study by the SE you want your full study to
achieve.
2. Take the square of this ratio.
3. Multiply the square of the ratio by the sample size of your pilot study.
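The three steps above can be sketched as a small Python function. This is only an illustration of the square-root law: the function name `required_sample_size` is hypothetical, and rounding up to the next whole participant is an assumption made here rather than something the steps specify. The values in the usage line are the ones from the worked example that follows (a pilot study of 20 participants, a pilot SE of 8.4, and a desired SE of 5).

```python
import math

def required_sample_size(pilot_se, target_se, pilot_n):
    """Scale a pilot study's sample size to reach a target SE (square-root law)."""
    ratio = pilot_se / target_se          # Step 1: pilot SE divided by desired SE
    ratio_squared = ratio ** 2            # Step 2: square the ratio
    raw_n = ratio_squared * pilot_n       # Step 3: multiply by the pilot sample size
    # Assumption: round up, because you can't enroll a fraction of a participant
    return math.ceil(raw_n)

total_n = required_sample_size(pilot_se=8.4, target_se=5.0, pilot_n=20)
print(total_n)
```

Because the calculation rests on a single pilot estimate of the SE, the result is best treated as a planning figure rather than a guarantee.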
Imagine that you want to estimate the slope to a precision or SE of ±5. If a pilot study of 20
participants gives you a SE of ±8.4 units, then the ratio is
8.4/5, which is 1.68. Squaring this ratio